{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial 13: Running RLlib experiments on EC2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial walks through how to run RLlib experiments on an AWS EC2 instance. This assumes that the machine you are using has already been configured for AWS (i.e. `~/.aws/credentials` is properly set up). We HIGHLY RECOMMEND the following as prior reading, as this will be something of an abridged version: https://github.com/ray-project/ray/blob/master/doc/source/autoscaling.rst\n",
    "\n",
    "\n",
    "While going through the above documentation, you can ignore all instructions on GCP, as GCP will not be covered in this tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A brief description of ray_autoscale.yaml\n",
    "\n",
    "This section explains the most salient components of `/flow/scripts/ray_autoscale.yaml`. We'll go over the variables you should change, as well as some that may come in handy. A more detailed guide is planned.\n",
    "\n",
    "* `cluster_name`: (CHANGE ME!) A unique identifier for the head node and workers of this cluster. If you want to set up multiple clusters, `cluster_name` must be changed each time the script is run.\n",
    "* `AMI`: Specifies which AMI to launch the instance with. A pre-built AMI is provided in flow/ray_autoscale.yaml. For further reading, please check out: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html\n",
    "* `setup_commands`: The commands to run, from the $HOME directory, once the instance is up and running. These can vary widely. If you're running experiments, you're most likely on a branch other than 'master', and this is the place to specify which branch to check out on EC2.\n",
    "    * To use a branch from the main flow-project repo, the command will look something like this:\n",
    "    \n",
    "    `cd flow && git pull && git checkout [YOUR-BRANCH-HERE]`\n",
    "    * To use a branch from your fork, the command will look something like this:\n",
    "    \n",
    "    `git remote add [USER] https://github.com/[USER]/flow && git fetch [USER] && git checkout [USER]/[YOUR-BRANCH-HERE]`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup and run clusters\n",
    "\n",
    "1. Once your yaml file is properly configured, start the cluster with:\n",
    "\n",
    "    `ray up ray_autoscale.yaml -y`\n",
    "    * The `-y` flag is optional; it simply answers 'yes' to all follow-up prompts.\n",
    "    \n",
    "2. Use the `ray exec` command to communicate with your cluster. \n",
    "\n",
    "    `ray exec ray_autoscale.yaml \"flow/examples/stabilizing_the_ring.py\"`\n",
    "    * This command accepts options such as running in tmux or stopping the cluster after the command completes. For the full list, see the link at the beginning of this tutorial.\n",
    "    \n",
    "3. Attach to the cluster via `ray attach`.\n",
    "    \n",
    "    `ray attach ray_autoscale.yaml -y`\n",
    "    * This opens an ssh session on the cluster's head node.\n",
    "    \n",
    "    \n",
    "Note that steps 2 and 3 can become tedious if you create multiple clusters and therefore end up with many versions of ray_autoscale.yaml. Ray commands identify a cluster by the `cluster_name` attribute in ray_autoscale.yaml. If you create 'test_0', 'test_1', 'test_2', 'test_3', and 'test_4' by overwriting 'test_0' with 'test_1', and so on, you have to manually change `cluster_name` in ray_autoscale.yaml to the cluster you intend to interact with before using `ray attach`, `ray exec`, or other `ray` commands. An alternative: once the cluster is created (i.e. after `ray up ray_autoscale.yaml -y` succeeds), the command returns an ssh command that connects to that cluster's IP directly. When running multiple clusters, it can be useful to save these ssh commands.\n",
    "\n",
    "Note also that you can start a cluster and run a command on it in a single step: <br />\n",
    "4. `ray exec ray_autoscale.yaml flow/examples/stabilizing_the_ring.py --start`\n"
   ]
  },
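  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to reduce the bookkeeping described above is to keep one config file per cluster instead of editing `cluster_name` by hand. The following is a minimal sketch, not part of flow or Ray: the helper `write_cluster_config` and the `ray_autoscale_[NAME].yaml` naming scheme are hypothetical, and it assumes the config has exactly one top-level `cluster_name:` line.\n",
    "\n",
    "```python\n",
    "import re\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def write_cluster_config(template_text, cluster_name, out_dir):\n",
    "    # Hypothetical helper: copy the config text, changing only cluster_name.\n",
    "    new_text, count = re.subn(\n",
    "        r'^cluster_name:.*$',\n",
    "        lambda match: 'cluster_name: ' + cluster_name,\n",
    "        template_text,\n",
    "        flags=re.MULTILINE,\n",
    "    )\n",
    "    if count != 1:\n",
    "        raise ValueError('expected exactly one top-level cluster_name line')\n",
    "    # One file per cluster, e.g. ray_autoscale_test_1.yaml.\n",
    "    out_path = Path(out_dir) / ('ray_autoscale_' + cluster_name + '.yaml')\n",
    "    out_path.write_text(new_text)\n",
    "    return out_path\n",
    "```\n",
    "\n",
    "For example, `write_cluster_config(Path('ray_autoscale.yaml').read_text(), 'test_1', '.')` writes `ray_autoscale_test_1.yaml`, which can then be passed to `ray up`, `ray exec`, and `ray attach` in place of `ray_autoscale.yaml`."
   ]
  },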
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Experiments\n",
    "\n",
    "Steps 2 and 4 from the previous section indicate how one may begin RLlib experiments in EC2. This section goes over some caveats to consider while running experiments.\n",
    "\n",
    "* tmux: Running experiments in tmux within the cluster is highly recommended, as this allows you to keep the process running in the background while you ssh out of the cluster or move around within it. This can be achieved by supplying the `ray exec` command with the `--tmux` flag.\n",
    "    - Or if you want to create a tmux session manually: \n",
    "        - To create a new session: `tmux new [-s] [SESSION_NAME]`\n",
    "        - To list all sessions: `tmux ls`\n",
    "        - To attach to the most recently created session: `tmux a`\n",
    "            - `tmux a #` if multiple sessions exist\n",
    "        - To attach to a specific session: `tmux a -t [SESSION_NO]`\n",
    "        - To detach from a session: ctrl-b + d\n",
    "        - To kill a session: `tmux kill-session -t [SESSION_NO]`\n",
    "        - To scroll within the session: ctrl-b + \\[\n",
    "            - To exit scroll mode: `q`\n",
    "            \n",
    "* Managing results: As usual, Ray results are automatically written to $HOME/ray_results. To upload these results to Amazon S3, configure the upload before running the experiment by adding an `upload_dir` entry to the experiment specification in the runner script (e.g. stabilizing_the_ring.py), as follows (note the # CHANGE ME!!! comment):\n",
    "\n",
    "```\n",
    "if __name__ == \"__main__\":\n",
    "    alg_run, gym_name, config = setup_exps()\n",
    "    ray.init(num_cpus=N_CPUS + 1)\n",
    "    trials = run_experiments({\n",
    "        flow_params[\"exp_tag\"]: {\n",
    "            \"run\": alg_run,\n",
    "            \"env\": gym_name,\n",
    "            \"config\": {\n",
    "                **config\n",
    "            },\n",
    "            \"checkpoint_freq\": 20,\n",
    "            \"checkpoint_at_end\": True,\n",
    "            \"max_failures\": 999,\n",
    "            \"stop\": {\n",
    "                \"training_iteration\": 200,\n",
    "            },\n",
    "            \"upload_dir\": \"s3://path/to/your/bucket\",  # CHANGE ME!!!\n",
    "        }\n",
    "    })\n",
    "```\n",
    "\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Close Clusters\n",
    "\n",
    "When you are done with the experiment, it's time to close the cluster. There are a few ways to do this.\n",
    "\n",
    "* Run `ray down ray_autoscale.yaml -y`.\n",
    "* Go to your EC2 instance console and terminate the desired instance.\n",
    "* Run your `ray exec` command with the `--stop` option, so that the cluster terminates once the command completes."
   ]
  },
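  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `--start`, `--tmux`, and `--stop` flags mentioned in this tutorial can be combined in a single `ray exec` call, so that one command starts the cluster, runs the experiment in tmux, and shuts the cluster down afterwards. The sketch below only assembles the argument list; the helper `build_ray_exec_command` is hypothetical and not part of Ray or flow.\n",
    "\n",
    "```python\n",
    "def build_ray_exec_command(config_path, command, start=False, tmux=False, stop=False):\n",
    "    # Hypothetical helper: assemble the argument list for `ray exec`.\n",
    "    args = ['ray', 'exec', config_path, command]\n",
    "    if start:\n",
    "        args.append('--start')  # start the cluster if it is not already running\n",
    "    if tmux:\n",
    "        args.append('--tmux')  # run the command inside tmux on the cluster\n",
    "    if stop:\n",
    "        args.append('--stop')  # terminate the cluster once the command completes\n",
    "    return args\n",
    "\n",
    "\n",
    "cmd = build_ray_exec_command(\n",
    "    'ray_autoscale.yaml',\n",
    "    'flow/examples/stabilizing_the_ring.py',\n",
    "    start=True, tmux=True, stop=True,\n",
    ")\n",
    "print(' '.join(cmd))\n",
    "```\n",
    "\n",
    "To actually launch the command from Python, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `ray` CLI is installed."
   ]
  },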
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
